1. What is mediation analysis?

In mediation, we consider an intermediate variable, 
called the mediator, that helps explain how or why 
an independent variable influences an outcome. In 
the context of a treatment study, it is often of great 
interest to identify and study the mechanisms by which 
an intervention achieves its effect. By investigating 
mediational processes that clarify how the treatment 
achieves the study outcome, not only can we further 
our understanding of the pathology of the disease and 
the mechanisms of treatment, but we may also be 
able to identify alternative, more efficient, intervention 
strategies. For example, a tobacco prevention program
may teach participants how to stop taking smoking
breaks at work (the intervention), which changes their
social norms about tobacco use (the intermediate
mediator) and subsequently leads to a reduction in
smoking behavior (study outcome).[1]

Mediation analysis gives insight into the mechanism
of action of pharmacological and psychotherapeutic
treatments. Such information adds a
dimension to our understanding of the etiology of disease and
the pathways of therapeutic effects, which can stimulate
the identification of more efficacious and cost-efficient
alternative therapies.

2. What is structural equation modeling? 

Structural equation modeling (SEM) is a very general, 
very powerful multivariate technique. It uses a 
conceptual model, path diagram and system of linked 
regression-style equations to capture complex and 
dynamic relationships within a web of observed and 
unobserved variables. Although similar in appearance, 
SEM is fundamentally different from regression. In 
a regression model, there exists a clear distinction 
between dependent and independent variables. In SEM, 
however, such concepts only apply in relative terms 
since a dependent variable in one model equation can 
become an independent variable in other components of
the SEM system. It is precisely this type of reciprocal
role a variable plays that enables SEM to infer causal 
relationships. 

SEM models include both endogenous and 
exogenous variables. Endogenous variables act as 
a dependent variable in at least one of the SEM 
equations; they are called endogenous variables rather 
than response variables because they may become 
independent variables in other equations within 
the SEM equations. Exogenous variables are always 
independent variables in the SEM equations. SEM 
equations model both the causal relationships between 
endogenous and exogenous variables, and the causal 
relationships among endogenous variables. 

SEM models are best represented by path diagrams. 
A path diagram consists of nodes representing the 
variables and arrows showing relations among these 
variables. By convention, in a path diagram latent 
variables (e.g., depression) are represented by a circle 
or ellipse and observed variables (e.g., a score on a 
rating scale) are represented by a rectangle or square. 
Arrows are generally used to represent relationships 
among the variables. A single straight arrow indicates a 
causal relation from the base of the arrow to the head 
of the arrow. Two straight single-headed arrows in 
opposing directions connecting two variables indicate 
a reciprocal causal relationship. A curved two-headed 
arrow indicates there may be some association between 
the two variables. Error terms for a variable are inserted 
into the path diagram by drawing an arrow from the 
value of the error term to the variable with which the 
term is associated. 

For example, in most path diagrams for cross-sectional
data, error terms are not connected, indicating
stochastic independence across the error terms. But if
we suspect association between error terms, which is
likely to occur in most longitudinal studies, the error
terms should be connected by curved two-headed
arrows. See Bollen[2] and Kowalski and Tu[3] for more
details about modeling complex relationships involving
latent constructs using SEM.
3. Advantages of using structural equation modeling
instead of standard regression methods for 
mediation analysis 

Baron and Kenny,[4] in the first paper addressing mediation
analysis, tested the mediation process using a series
of regression equations. However, mediation assumes
both causality and a temporal ordering among the three
variables under study (i.e., intervention, mediator and
response). Since variables in a causal relationship can
be both causes and effects, the standard regression
paradigm is ill-suited for modeling such a relationship
because of its a priori assignment of each variable as
either a cause or an effect.[1,5,6] Structural equation
modeling (SEM) provides a more appropriate inference 
framework for mediation analyses and for other types 
of causal analyses. 

There are many advantages to using the SEM 
framework in the context of mediation analysis. When 
a model contains latent variables such as happiness, 
quality of life and stress, SEM allows for ease of 
interpretation and estimation. SEM simplifies testing of 
mediation hypotheses because it is designed, in part, 
to test these more complicated mediation models in 
a single analysis.[7] SEM can be used when extending a
mediation process to multiple independent variables,
mediators or outcomes. This contrasts with standard
regression, in which ad hoc methods must be used for
inference about indirect and total effects.[4,8,9] These
ad hoc methods rely on combining the results of two 
or more equations to derive the asymptotic variance. 
This is especially problematic when there are different 
numbers of observations missing in the different 
regression equations representing a mediation process. 
Also, in standard regression, we handle missing data via 
listwise deletion since there is no built-in missing data 
mechanism when using ordinary least squares (OLS). 

Another important advantage of SEM over 
standard regression methods is that the SEM analysis 
approach provides model fit information about the
consistency of the hypothesized mediational model
with the data and evidence for the plausibility of the
causality assumptions[10,11] made when constructing the
mediation model. The standard regression procedure
initially recommended by Baron and Kenny[4] has also
been shown to have low power.[7] Moreover, unlike
standard regression approaches, SEM allows for ease of
extension to longitudinal data within a single framework,
corresponding with a study's conceptual framework
for clear hypothesis articulation.[12] Finally, Bollen and
Pearl[10] note that even when the same equation is
used in SEM and in regression analysis, the results will 
be different because they are based on completely 
different assumptions. Standard regression analysis 
implies a statistical relationship based on a conditional 
expected value, while SEM implies a functional 
relationship expressed via a conceptual model, path 
diagram, and mathematical equations. Thus, the causal 
relationships in a hypothesized mediation process, the 
simultaneous nature of the indirect and direct effects, 
and the dual role the mediator plays as both a cause for 
the outcome and an effect of the intervention are more 
appropriately expressed using structural equations than 
using regression analysis. 

4. Use of SEM for mediation analysis 

Figure 1 shows a path diagram for the causal 
relationships between the three variables in the 
smoking prevention example discussed earlier: 
prevention program (x), social norm (z), and amount 
of smoking (y). In this example, all variables that are
affected by other variables (social norms and amount
of smoking) are endogenous variables, while variables
that only impart an effect on other variables without
being affected by other variables (the prevention
program) are exogenous variables. All three variables
in this smoking prevention example are assumed to
be observed, so rectangles (not circles) are used to
represent the variables.
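
The endogenous/exogenous distinction can be read directly off the arrows of a path diagram. The following minimal Python sketch assumes the diagram is stored as a list of (cause, effect) pairs; the names `edges` and `classify_variables` are hypothetical and used only for illustration.

```python
# Each arrow in the path diagram is a (cause, effect) pair.
# x = prevention program, z = social norm, y = amount of smoking.
edges = [("x", "z"), ("z", "y"), ("x", "y")]


def classify_variables(edges):
    """Split variables into exogenous and endogenous lists.

    A variable is endogenous if at least one arrow points to it;
    a variable with no incoming arrow is exogenous.
    """
    variables = {name for edge in edges for name in edge}
    endogenous = {effect for _, effect in edges}
    exogenous = variables - endogenous
    return sorted(exogenous), sorted(endogenous)


exogenous, endogenous = classify_variables(edges)
print("exogenous:", exogenous)
print("endogenous:", endogenous)
```

Here the mediator z appears both as an effect (of x) and as a cause (of y), which is the dual role that makes it endogenous.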



Figure 1: Pathway of a mediation process for a tobacco prevention program 
The SEM for this mediation model for the $i$th
subject ($1 \le i \le n$) is given by:

$z_i = \beta_0 + \beta_{xz} x_i + \epsilon_{zi}$,
$y_i = \gamma_0 + \gamma_{xy} x_i + \gamma_{zy} z_i + \epsilon_{yi}$.

We assume the error terms ($\epsilon_{zi}$, $\epsilon_{yi}$) are uncorrelated,
an important assumption for causal inference in
performing mediation analysis.[10,11] We also assume
multivariate normality for the error terms; this is a 
necessary underlying condition of the definition of 
direct, indirect and total effects. Note that the two 
structural equations are linked together and inference 
about them is simultaneous, unlike two independent 
standard regression equations. 

The direct effect is the pathway from the exogenous 
variable to the outcome while controlling for the 
mediator. Therefore, in our path diagram $\gamma_{xy}$ is the direct
effect. The indirect effect describes the pathway from
the exogenous variable to the outcome through the
mediator. This path is represented through the product
of $\beta_{xz}$ and $\gamma_{zy}$. Finally, the total effect is the sum of the
direct and indirect effects of the exogenous variable on
the outcome, $\gamma_{xy} + \beta_{xz}\gamma_{zy}$.
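
The three effects defined above can be illustrated numerically. The sketch below is not a full SEM fit: it assumes simulated data with arbitrary "true" coefficients and uses equation-by-equation least squares as a simple stand-in for simultaneous estimation. All function and variable names are hypothetical.

```python
import random
from statistics import fmean


def cov(a, b):
    """Covariance of two equal-length sequences (divisor n)."""
    mean_a, mean_b = fmean(a), fmean(b)
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / len(a)


def slope_one(x, y):
    """Least-squares slope of y on a single predictor x (with intercept)."""
    return cov(x, y) / cov(x, x)


def slopes_two(x, z, y):
    """Least-squares slopes of y on x and z (with intercept)."""
    sxx, szz, sxz = cov(x, x), cov(z, z), cov(x, z)
    sxy, szy = cov(x, y), cov(z, y)
    det = sxx * szz - sxz ** 2
    slope_x = (sxy * szz - szy * sxz) / det
    slope_z = (szy * sxx - sxy * sxz) / det
    return slope_x, slope_z


# Simulated data; the coefficients 0.8, 0.3 and 0.6 are arbitrary assumptions.
rng = random.Random(0)
n = 2000
x = [float(i % 2) for i in range(n)]                 # intervention (0/1)
z = [0.5 + 0.8 * xi + rng.gauss(0, 1) for xi in x]   # mediator
y = [1.0 + 0.3 * xi + 0.6 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]

beta_xz = slope_one(x, z)                 # path x -> z
gamma_xy, gamma_zy = slopes_two(x, z, y)  # paths x -> y and z -> y

direct = gamma_xy
indirect = beta_xz * gamma_zy
total = direct + indirect
reduced = slope_one(x, y)                 # slope of y on x without the mediator

print("direct:", round(direct, 3))
print("indirect:", round(indirect, 3))
print("total:", round(total, 3), "reduced-model slope:", round(reduced, 3))
```

For least-squares estimates of this linear model, the total effect coincides with the slope from the regression of y on x alone, which the last printed line illustrates.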

The primary hypothesis of interest in a mediation 
analysis is to see whether the effect of the independent 
variable (intervention) on the outcome can be 
mediated by a change in the mediating variable. In a full 
mediation process, the effect is 100% mediated by the 
mediator, that is, in the presence of the mediator, the 
pathway connecting the intervention to the outcome is 
completely broken so that the intervention has no direct 
effect on the outcome. In most applications, however, 
partial mediation is more common, in which case 
the mediator only mediates part of the effect of the 
intervention on the outcome, that is, the intervention 
has some residual direct effect even after the mediator 
is introduced into the model. 

In terms of testing the primary hypothesis of 
interest, we start by examining a reduced regression 
equation without the mediator: 

$y_i = \gamma^*_0 + \gamma^*_{xy} x_i + \epsilon^*_{yi}$

If we fail to reject the null hypothesis ($H_0: \gamma^*_{xy} = 0$) for this
reduced regression equation, then there is no evidence that x and y (i.e., the
intervention and the outcome) are related and
we should not consider potential mediators. If
we reject the null hypothesis for this reduced regression
equation, we proceed to evaluate the SEM for the
mediation model. Full mediation (i.e., the intervention has no
direct effect on the outcome) corresponds to the null
hypothesis, $H_0: \gamma_{xy} = 0$. If this null is rejected, it becomes
of interest to assess partial mediation via the direct,
indirect and total effects. Inference (standard errors and
p-values) about such effects is easily performed using
the Delta or Bootstrap methods.[8,9,13]
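
The bootstrap option mentioned above can be sketched as follows. This is a minimal illustration on simulated data, assuming a percentile bootstrap of the indirect effect with least-squares path estimates; the sample size, number of resamples, and coefficients are arbitrary assumptions, and SEM software typically provides its own implementations.

```python
import random
from statistics import fmean


def cov(a, b):
    """Covariance of two equal-length sequences (divisor n)."""
    mean_a, mean_b = fmean(a), fmean(b)
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / len(a)


def indirect_effect(x, z, y):
    """Product of the x -> z slope and the z -> y slope adjusted for x."""
    sxx, szz, sxz = cov(x, x), cov(z, z), cov(x, z)
    sxy, szy = cov(x, y), cov(z, y)
    beta_xz = sxz / sxx
    gamma_zy = (szy * sxx - sxy * sxz) / (sxx * szz - sxz ** 2)
    return beta_xz * gamma_zy


# Simulated data (assumed true indirect effect: 0.8 * 0.6 = 0.48).
rng = random.Random(1)
n = 400
x = [float(i % 2) for i in range(n)]
z = [0.5 + 0.8 * xi + rng.gauss(0, 1) for xi in x]
y = [1.0 + 0.3 * xi + 0.6 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]

point = indirect_effect(x, z, y)

# Resample subjects with replacement and re-estimate the indirect effect.
n_boot = 500
draws = []
for _ in range(n_boot):
    idx = [rng.randrange(n) for _ in range(n)]
    draws.append(indirect_effect([x[i] for i in idx],
                                 [z[i] for i in idx],
                                 [y[i] for i in idx]))
draws.sort()

# Approximate 95% percentile interval from the sorted bootstrap estimates.
lower = draws[int(0.025 * n_boot)]
upper = draws[int(0.975 * n_boot) - 1]

print("indirect effect:", round(point, 3))
print("95% bootstrap interval:", round(lower, 3), "to", round(upper, 3))
```

An interval that excludes zero is usually read as evidence of a nonzero indirect effect. Because the product of two estimates is often skewed, the bootstrap is frequently preferred to a symmetric normal-theory interval.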

Significant advances have been made over the past 
few decades in the theory, applications and associated 
software development for fitting SEM models that 
can be used in the context of mediation analysis. For
example, in addition to specialized packages such as
LISREL,[14] Mplus,[15] EQS,[16] and Amos,[17] procedures for
fitting SEM are also available from general-purpose
statistical packages such as R, SAS, Stata and Statistica.
These packages provide inference based on maximum 
likelihood, generalized least squares, and weighted least 
squares. 

5. An example of mediation analysis using SEM to 
model the relationship of drinking to suicidal risk 

Project MATCH[18] is a multisite treatment trial for alcohol
use disorders that enrolled 1,726 participants (including
24% women) with a mean (sd) age of 40.2 (11.0) years.
Previously, studies of alcohol-dependent individuals
established that drinking promotes depressive
symptoms and depressive disorders and that depression
is an important risk factor for suicidal thoughts and
behavior.[19] Therefore, considering the context of the
study and prior theory, mediation analysis was used to
evaluate the hypothesis that greater drinking intensity
leads to higher levels of depression which, in turn, leads
to suicidal ideation.[19] In the model, drinking intensity
was a latent construct based on three months of data
about drinking behavior, while depression and suicidal
ideation were measured using the Beck Depression
Inventory.[20]

Mediation analysis with SEM was performed using
Mplus software. Age, gender, race, treatment assignment,
study arm, and baseline percent days abstinent were
controlled for in the structural equations for each
endogenous variable in the structural model. The
outcome (the presence or absence of suicidal ideation)
was analyzed via the probit link (which
transforms outcome probabilities to the scale of a standard
normal variable), which made it possible to interpret
the indirect, direct and total effects on an interval
scale. Subjects were assessed at baseline and at 3-,
9-, and 15-month follow-up, but in order to derive 
a single direct, indirect and total effect in the model 
(as in models of cross-sectional data) we constrained 
all model parameters at the three follow-up times to 
be equal and controlled for the baseline value of the 
outcome measure. Standardized estimates (between -1 
and 1) were reported rather than raw estimates, so that 
estimates from different structural equations are on the 
same scale, simplifying interpretation. 
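
The probit link mentioned above maps a probability to the corresponding quantile of the standard normal distribution. The short Python sketch below illustrates that mapping with the standard library; the helper names `probit` and `inverse_probit` and the example probabilities are assumptions for illustration and are not taken from the study.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)


def probit(p):
    """Map a probability in (0, 1) to the standard normal scale."""
    return standard_normal.inv_cdf(p)


def inverse_probit(value):
    """Map a value on the standard normal scale back to a probability."""
    return standard_normal.cdf(value)


# Hypothetical probabilities of the binary outcome.
for p in (0.05, 0.20, 0.50):
    print("probability", p, "-> probit", round(probit(p), 3))
```

On the probit scale, effects are expressed as shifts of an underlying standard normal variable, which is what allows direct, indirect and total effects to be added together for a binary outcome.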

In the regression equation without the mediator,
the estimate of the causal path from drinking intensity
to suicidality was significant ($\gamma^*_{xy}$=0.20, p<0.001).

The path diagram of the mediation model in Figure 2
includes the standardized estimates for the
causal paths for the indirect and direct effects. Both
estimated paths for the indirect effect were statistically
significant, while the estimate of the direct effect ($\gamma_{xy}$)
from drinking intensity to suicidal ideation was close 
to zero and not significant. Therefore, potentially, 
depression fully mediates the path between drinking 
intensity and suicidal ideation. The model showed 
reasonably good model fit according to multiple SEM fit
statistics and indices: $\chi^2$(df=59)=218.29, p<0.001; Root
Mean Square Error of Approximation (RMSEA)=0.042;
Comparative fit index (CFI)=0.947; Tucker-Lewis index
(TLI)=0.933. Rule-of-thumb guidelines are that CFI >0.95,
TLI >0.95 and RMSEA <0.05 represent a good fitting 
model. 
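
As a small illustration, the reported indices can be checked against the rule-of-thumb cutoffs quoted above. The helper name `meets_guideline` is hypothetical, and the cutoffs are guidelines rather than strict tests.

```python
# Fit indices reported for the mediation model in the text.
reported = {"CFI": 0.947, "TLI": 0.933, "RMSEA": 0.042}


def meets_guideline(name, value):
    """Return True if an index satisfies the rule-of-thumb cutoff in the text."""
    if name in ("CFI", "TLI"):
        return value > 0.95   # larger is better
    if name == "RMSEA":
        return value < 0.05   # smaller is better
    raise ValueError(f"no guideline defined for {name}")


results = {name: meets_guideline(name, value) for name, value in reported.items()}
for name, ok in results.items():
    print(name, reported[name], "meets guideline" if ok else "below guideline")
```

With these values, RMSEA satisfies its cutoff while CFI and TLI fall slightly short of 0.95, which is consistent with describing the fit as reasonably good rather than good.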



Figure 2: Pathway of a mediation process for a clinical model of drinking and suicidal risk (*p<0.05)




6. Other issues to consider when performing mediation 
analysis 

Baron and Kenny[4] distinguished mediation from
moderation, in which a third variable affects the
strength or direction of the relationship between an
independent variable and an outcome. In multi-group
analyses a moderator is typically either part of an
interaction term or a grouping variable. For example,
if males are known to react differently than females
to a particular intervention for lowering cholesterol,
then gender is a moderator, represented by a
gender-by-treatment interaction effect. In mediated-moderation, such an interaction
is used as an independent (i.e., exogenous) variable in
the SEM path diagram.
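
The gender-by-treatment example can be made concrete with simulated data. The sketch below assumes a binary treatment and a binary moderator, in which case the interaction coefficient of a regression with both main effects equals the difference between the group-specific treatment effects. The effect sizes and names are hypothetical.

```python
import random
from statistics import fmean

# Assumed treatment effects on the outcome (arbitrary values): the
# intervention lowers the outcome more for females than for males.
TRUE_EFFECT = {"female": -1.0, "male": -0.5}

rng = random.Random(2)
n = 4000
records = []
for i in range(n):
    treated = i % 2                                   # balanced assignment
    gender = "male" if (i // 2) % 2 else "female"     # balanced groups
    outcome = 5.0 + TRUE_EFFECT[gender] * treated + rng.gauss(0, 1)
    records.append((gender, treated, outcome))


def treatment_effect(records, gender):
    """Mean outcome difference (treated minus control) within one group."""
    treated = [o for g, t, o in records if g == gender and t == 1]
    control = [o for g, t, o in records if g == gender and t == 0]
    return fmean(treated) - fmean(control)


effect_female = treatment_effect(records, "female")
effect_male = treatment_effect(records, "male")

# Moderation: the treatment effect differs across levels of the moderator.
interaction = effect_male - effect_female

print("effect in females:", round(effect_female, 3))
print("effect in males:", round(effect_male, 3))
print("interaction (difference in effects):", round(interaction, 3))
```

A nonzero interaction indicates that gender changes the strength of the treatment effect. This is a different question from mediation, which asks through which intermediate variable the effect operates.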

Longitudinal data help capture both within-individual
dynamics and between-individual differences
over time. Also, longitudinal data allow for the
examination of whether changes in the mediator
are more likely to precede changes in the outcome,
presenting more accurate representations of the
temporal order of change over time that lead to more
accurate conclusions about mediation.[7] Latent growth
modeling is an SEM extension for longitudinal data that
can flexibly evaluate mediating relationships between
multiple time-varying measures.[12] Autoregressive and
multilevel models have also been used for longitudinal 
mediation analyses with SEM. 

Causal inference methods, which use the language 
of counterfactuals and potential outcomes, have been 
used in mediation analysis.[21] These approaches address
the issues of potential confounders of the mediator-outcome
relationship and of potential interactions
between the mediator and treatment. They also
provide definitions for deriving effects for analyses
involving mediators and outcomes that are not on
an interval scale (i.e., count data, categorical data).
These causal inference methods can be applied in the
SEM framework.[22,23] Imai and colleagues[11] proposed
approaches to extend SEM by using causal inference
methods to generate a more general definition,
identification, estimation, and sensitivity analysis of
causal mediation effects that are not based on any
specific statistical model; they also introduced an R
package for performing causal mediation analysis using
their approaches.[11]

7. Conclusion 

Mediation helps explain the mechanism through which 
an intervention influences an outcome and assumes 
both causal and temporal relations. When performed 
using strong prior theory and with appropriate 
context, mediation analysis helps provide a focus for 
future intervention research so more efficacious and 
cost-efficient alternative therapies may be developed. 
Structural equation modeling provides a very general, 
flexible framework for performing mediation analysis. 